PSYCH 413 Midterm Exam
Due Wednesday, Oct 22nd by 15:30
General
- Please submit your responses as a nicely formatted notebook (.ipynb) file.
- If your answer includes many decimal places (e.g., 3.14159265358979), please round it to a reasonable number of decimals for readability (typically 3 to 4).
- For ease of marking, avoid showing the outputs of unnecessary values.
- Make sure your code runs without errors and shows all required outputs.
- Good coding style matters: clean, organized, and well-commented code will be rewarded. Disorganized, redundant, or poorly structured code may lose marks.
Requirements:
- Unless otherwise specified, set any trimming you need to do at 20% and use an \(\alpha = 0.05\).
- Unless the instructions explicitly state otherwise (e.g., "assume the data are normally distributed"), you are responsible for checking whether the assumptions of the method are reasonable and for using robust approaches when they are not.
- If classical test assumptions are not violated, use the classical test.
Plots, Packages, and Functions:
- All plots must be made with ggplot2 and include clear, descriptive axis titles rather than default column names.
- Only the following packages are permitted:
  - tidyverse
  - WRS2
- Unless stated otherwise, the following functions are not permitted:
  - IQR()
  - quantile()
  - mad()
  - t.test()
  - yuen()
  - trimse()
Question 1
Figure 1. Candy Corn (Confectio tricoloris)
You hypothesize that candy corn, a small (approx. 0.5 g), triangular Halloween candy with white, orange, and yellow layers (see Figure 1), is only enjoyable in small doses: after a few pieces, people slow down their consumption.
You record how long (in seconds) it takes each participant to eat five pieces of candy corn at two doses: 1) before eating any pieces, and 2) after eating 20 pieces. The data is in candy_corn.csv.
Based on the results, is there evidence that participants' candy corn eating speed changes after they've already eaten several pieces? Conduct an appropriate statistical test that evaluates whether or not your hypothesis should be accepted.
Report the following:
- The null & alternative hypothesis.
- Test-statistic
- Degrees of Freedom
- P-value
- Confidence Interval
- Conclusion
\(H_0\): \(\mu_\text{large} - \mu_\text{small} \leq 0\)
\(H_1\): \(\mu_\text{large} - \mu_\text{small} > 0\)
# Note: Data is paired
cc <- read_csv("data/candy_corn.csv") |>
  arrange(participant)

# Difference Scores
diffs <- cc$dur[cc$dose == "large"] - cc$dur[cc$dose == "small"]
# Check normality
ggplot(mapping = aes(sample = diffs)) +
stat_qq(size = 3) +
stat_qq_line() +
labs(
x = "Theoretical Quantiles",
y = "Sample Quantiles"
)
Data is sufficiently normal looking.
# Sum stats
N <- length(diffs)
m <- mean(diffs)
se <- sd(diffs) / sqrt(N)
# Test Statistic
df <- N - 1
mu <- 0
t <- (m - mu) / se
p <- pt(t, df = df, lower.tail = FALSE)
# Confidence Interval
alpha <- 0.05
t_crit <- abs(qt(alpha, df = df))
low_ci <- m - t_crit * se
top_ci <- Inf  # one-sided test: the interval is unbounded above
# Results
paste0("Test Stat. = ", round(t, 5))
paste0("df = ", df)
paste0("p = ", round(p, 5))
paste0("95% CI = [", round(low_ci, 5), ", ", top_ci, ")")
[1] "Test Stat. = 44.33929"
[1] "df = 129"
[1] "p = 0"
[1] "95% CI = [9.10037, Inf)"
Since \(p < 0.05\) and / or \(0\) is not contained in the confidence interval, we reject the null hypothesis and accept the alternative.
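The equivalence invoked here, rejecting exactly when 0 falls outside the one-sided interval, can be sanity-checked on hypothetical difference scores (the numbers below are illustrative, not from candy_corn.csv):

```r
set.seed(1)
d <- rnorm(20, mean = 1)  # hypothetical paired difference scores

n  <- length(d)
m  <- mean(d)
se <- sd(d) / sqrt(n)

t_stat <- m / se                                      # H0: mu_diff <= 0
p_one  <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # one-tailed p
low_ci <- m - qt(0.95, df = n - 1) * se               # one-sided 95% lower bound

# The one-tailed test at alpha = .05 rejects exactly when the lower bound exceeds 0
(p_one < 0.05) == (low_ci > 0)  # TRUE
```

Algebraically, \(p < \alpha \iff t > t_\text{crit} \iff \bar{d} - t_\text{crit} \cdot se > 0\), so the two decision rules always agree.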
Question 2
Every Halloween, your instructor grows a patch of pumpkins to use as lecture props.
This year, he recorded the weights (in kilograms) of every pumpkin grown in the patch and plotted them in the histogram below. Use the histogram to estimate what \(\mu\) and \(\sigma\) are.
The histogram is divided into bins covering a 1-kg range. The lower bound of each range is inclusive, and the upper bound is exclusive (e.g., the first bin spans 1 ≤ x < 2).
Note: since the official \(\mu\) and \(\sigma\) can't be known exactly, the histogram just allows us to calculate an estimate of these numbers. There are two reasonable ways to think about this.
Option 1: Treat every pumpkin in a bin as weighing exactly \(x\) kg, the bin's lower bound.
x <- 1:13
w <- c(1, 12, 14, 9, 8, 18, 5, 3, 2, 3, 3, 0, 2)
# Mu
m <- sum(x * w) / sum(w)
m
[1] 5.2125
Option 2: Treat every pumpkin in a bin as weighing the midpoint of the bin's range. This is a little more precise.
x <- 1:13 + 0.5
w <- c(1, 12, 14, 9, 8, 18, 5, 3, 2, 3, 3, 0, 2)
# Mu
m <- sum(x * w) / sum(w)
m
[1] 5.7125
\(\sigma\) should result in the same number for both methods, since shifting every value by a constant 0.5 kg changes the mean but not the spread.
# Sigma
values <- rep(x, w)
N <- length(values)
s <- sqrt(sum((values - m)^2) / N)
s
[1] 2.72348
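This invariance is easy to verify directly: the midpoint encoding is just the lower-bound encoding shifted by a constant 0.5 kg. A quick base-R check using the same bin counts as above:

```r
w <- c(1, 12, 14, 9, 8, 18, 5, 3, 2, 3, 3, 0, 2)  # bin counts from the histogram

v1 <- rep(1:13, w)        # Option 1: lower-bound encoding
v2 <- rep(1:13 + 0.5, w)  # Option 2: midpoint encoding (v1 shifted by 0.5)

# Population SD (divide by N, as above)
pop_sd <- function(x) sqrt(sum((x - mean(x))^2) / length(x))

all.equal(pop_sd(v1), pop_sd(v2))  # TRUE: the shift leaves sigma unchanged
```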
Question 3
Building on the previous question, use a robust method of outlier detection to uncover any "unusually spooky" pumpkin weights:
- Identify which weights are classified as outliers.
- Report how many pumpkins correspond to each outlier weight.
med <- median(values)
madn <- median(abs(values - med)) / 0.6745  # normalized MAD (estimates sigma under normality)
outs <- abs(values - med) / madn > 2.24     # MAD-median outlier rule
table(values[outs])
13.5
2
Two pumpkins in the 13 to 14 kg bin (midpoint 13.5 kg) are outliers.
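As a quick illustration of why the rule works, here is the same MAD-median procedure run on a small hypothetical vector with one planted outlier:

```r
# Toy data (hypothetical): typical values plus one planted outlier
x <- c(4, 5, 5, 6, 6, 7, 7, 8, 50)

med  <- median(x)
madn <- median(abs(x - med)) / 0.6745  # normalized MAD

x[abs(x - med) / madn > 2.24]  # only the planted 50 is flagged
```

Because the median and MAD ignore the extreme tail, the threshold stays anchored to the bulk of the data, whereas a mean/SD rule would be dragged toward the outlier it is trying to detect.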
Question 4
In the 1930s, Joseph Rhine at Duke University conducted famous "experiments" on extrasensory perception (ESP) using Zener cards: a deck of 25 cards with five repeating symbols (circle, star, wavy lines, square, cross). Participants tried to predict the symbol on each unseen card.
In a modern replication of Rhine's work, over 1000 participants (N = 1329) attempted to predict 25 Zener cards each on an online version of the task.
The number of correct guesses follows an approximately normal distribution with a \(\bar{x} = 5\) and a \(s = 2\), which corresponds to chance expectations.
a.
Dr. Rhine's team classifies anyone in the top 5% of scores as a potential "psychic."
What is the minimum number of correct guesses required to be in this top 5%?
ceiling(qnorm(0.95, mean = 5, sd = 2))
[1] 9
Since a prediction cannot be partially right or wrong, 9 is the minimum.
If you rounded down to 8, that would put you below the 5% threshold (hence the need to round up).
pnorm(8, mean = 5, sd = 2)
[1] 0.9331928
b.
The Amazing Criswell, the flamboyant psychic who once warned that "the future is where we will spend the rest of our lives," scored 2.5 standard deviations below the mean. What was his score?
# Rearrange the formula for a z-score
m <- 5
s <- 2
z <- -2.5
z * s + m
[1] 0
c.
Miss Cleo, a famous late-night television psychic, correctly identified 9 cards on the task. What percentile does her score correspond to, and why is it problematic to refer to her as psychic given her results in this study?
pnorm(9, mean = 5, sd = 2) * 100
[1] 97.72499
Miss Cleo's score lies around the 98th percentile of the distribution, suggesting her performance is statistically unusual: only about 2% of people would reach or exceed this level if guessing randomly. However, an isolated high score could still occur by coincidence, so while her result is improbable, it does not necessarily provide strong evidence for psychic ability without replication.
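To put a number on the coincidence argument: with N = 1329 participants all guessing randomly, the chance that at least one of them reaches a score like Miss Cleo's is essentially guaranteed. A rough sketch under the same normal approximation (treating scores as continuous):

```r
n <- 1329
p_single <- pnorm(9, mean = 5, sd = 2, lower.tail = FALSE)  # one guesser scoring 9+
p_any    <- 1 - (1 - p_single)^n                            # at least one such guesser
p_any  # effectively 1: some high scorers are expected by chance alone
```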
Question 5
Researchers studying Halloween happiness propose that dopamine (DA) levels change when people eat candy, but they wonder whether this effect depends on whether the candy is their favourite or not.
To test this, they have participants wear a portable dopamine sensor. In one treatment participants eat a mini chocolate bar from their personal top-three favourite candies. In another treatment they eat a random candy drawn from a generic Halloween mix (e.g., candy corn, licorice, or a mystery taffy). Dopamine change in nanomoles per liter (nmol/L) is recorded for 10 minutes after each treat.
The data can be found in dopamine.csv.
Do these results provide convincing evidence that dopamine levels differ after eating favourite candy versus random Halloween candy? Clearly state the null and alternative hypotheses and perform an appropriate statistical test.
Report the following:
- The null & alternative hypothesis.
- Test-statistic
- Degrees of Freedom
- P-value
- Confidence Interval
- Conclusion
candy <- read_csv("data/dopamine.csv") |>
arrange(participant)
# Note: data contains independent groups.
\(H_0\): \(\mu_{_\text{fave}} - \mu_{_\text{rand}} = 0\)
\(H_1\): \(\mu_{_\text{fave}} - \mu_{_\text{rand}} \neq 0\)
# Check normality
ggplot(candy, aes(sample = DA)) +
stat_qq() +
stat_qq_line() +
facet_wrap(~ candy, scales = "free") +
labs(
x = "Theoretical Quantiles",
y = "Sample Quantiles"
)
Data (in the fave treatment) is non-normal (and groups are independent). A trimmed-mean (Yuen's) t-test is necessary.
library(WRS2)
# Summary Stats
stats <- candy |>
group_by(candy) |>
summarise(
n = length(DA),
G = 0.2,
m = mean(DA, tr = G),
v = winvar(DA, tr = G),
h = n - 2 * floor(G * n),
d = ((n - 1) * v) / (h * (h - 1))
)
# Standard Error of the trimmed mean difference
se <- sqrt(sum(stats$d))
# Degrees of freedom
df <- (sum(stats$d)^2) / sum((stats$d^2) / (stats$h - 1))
# P-value
mu <- 0
t <- ((stats$m[1] - stats$m[2]) - mu) / se
p <- pt(abs(t), df = df, lower.tail = FALSE) * 2  # two-tailed
# Confidence Interval
alpha <- 0.05
t_crit <- abs(qt(alpha / 2, df = df))
low_ci <- (stats$m[1] - stats$m[2]) - t_crit * se
top_ci <- (stats$m[1] - stats$m[2]) + t_crit * se
# Results
paste0("Test Stat. = ", round(t, 5))
paste0("df = ", round(df, 5))
paste0("p = ", round(p, 5))
paste0("95% CI = [", round(low_ci, 5), ", ", round(top_ci, 5), "]")
[1] "Test Stat. = 0.60709"
[1] "df = 153.84398"
[1] "p = 0.54469"
[1] "95% CI = [-0.28864, 0.54474]"
Since \(p > 0.05\) and / or \(0\) is contained in the confidence interval, we FAIL to reject the null hypothesis.
Given that the alternative can't be accepted, the data does not provide convincing evidence of a change in dopamine.
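For reference, the winsorized variance computed by WRS2's winvar() above can be reproduced in base R. A minimal sketch (the function name winvar_manual and the toy vector are illustrative):

```r
# 20% winsorized variance: clamp the g smallest / largest values to the nearest
# retained order statistics, then take the ordinary sample variance
winvar_manual <- function(x, tr = 0.2) {
  n  <- length(x)
  g  <- floor(tr * n)
  xs <- sort(x)
  lo <- xs[g + 1]              # smallest retained value
  hi <- xs[n - g]              # largest retained value
  wx <- pmin(pmax(x, lo), hi)  # winsorize both tails
  sum((wx - mean(wx))^2) / (n - 1)
}

winvar_manual(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100))  # the outlier's influence is capped
```

Unlike trimming, winsorizing replaces extreme values rather than discarding them, which keeps the divisor at \(n - 1\).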
Question 6
Plot a bar graph of the previous question's results such that the significance of the test can be evaluated. Use confidence intervals for the error bars.
Display a dataframe that includes only the mean and the lower and upper bounds of the interval for each treatment. Do not include any additional statistics in the output.
# Summary Stats
plot_data <- candy |>
group_by(candy) |>
summarise(
n = length(DA),
G = 0.2,
m = mean(DA, tr = G),
s = sqrt(winvar(DA, tr = G)),
se = s / ((1 - 2 * G) * sqrt(n)),
h = n - 2 * floor(G * n),
df = h - 1,
alpha = 0.05,
t_crit = abs(qt(alpha / 2, df = df)),
low_ci = m - t_crit * se,
top_ci = m + t_crit * se
) |>
select(candy, m, low_ci, top_ci)
plot_data
# A tibble: 2 × 4
candy m low_ci top_ci
<chr> <dbl> <dbl> <dbl>
1 Favourite 5.130772 4.816412 5.445132
2 Random 5.002720 4.724083 5.281357
ggplot(plot_data, aes(x = candy, y = m)) +
geom_bar(stat = "identity", colour = "black", fill = "#9B33AE") +
geom_errorbar(aes(ymin = low_ci, ymax = top_ci), width = 0.25) +
labs(
x = "Candy",
y = "Dopamine (nmol/L, tr = 0.2)"
)
Question 7
A psychologist collects data on "time (in seconds) spent staring at a ghost illusion" for 12 participants and plans to test whether the mean time differs from 5 s. A histogram of the results shows one participant spent 50 s staring, while all others are between 3 and 8 s, i.e., there is one clear outlier.
a.
Would this extreme value affect an assumption of normality? If so, how?
Yes. Since neither the mean nor the standard deviation is robust to extreme values, the mean is likely to be pulled to a location unrepresentative of the data's true centre, positively skewing the data, and the spread is likely to be inflated, hurting the data's bell-shaped symmetry.
b.
Would a classic t-test be appropriate (why or why not)? If not, how should the outlier be dealt with?
No. Use of a t-distribution with small samples requires the underlying population, and by extension the sample (since that is our only snapshot of the population), to be normally distributed. Since the outlier hurts the normality assumption, a robust version of the test needs to be employed.
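The pull described in part (a) is easy to see with hypothetical numbers matching the scenario (eleven typical times plus the 50 s stare; the exact values are illustrative):

```r
# Hypothetical staring times (s): eleven typical values plus one extreme
times <- c(3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 50)

c(mean = mean(times), sd = sd(times))  # mean = 9.25 s, sd ~ 12.9 s: both distorted

c(median = median(times),
  madn   = median(abs(times - median(times))) / 0.6745)  # ~6 s and ~1.48 s: stable
```

The mean lands outside the 3 to 8 s range where almost all the data sit, while the robust summaries stay with the bulk of the observations, which is exactly why a robust test is preferred here.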
Question 8
a.
A university student named Frank Stein scores 90% on an exam. Assume the results follow a normal distribution with a mean of 95% and a standard deviation of 3%. Should this student be proud of their score?
z <- (90 - 95) / 3
pnorm(z) * 100
[1] 4.779035
No, they are in approximately the bottom 5% of their class. i.e., their performance is very low even if the absolute number looks high.
If you wrote that Frank could be both proud and not proud of his mark, you lost marks. That's a contradictory response, and it doesn't actually answer the question of whether he should be proud.
If you argued that he should be proud, you needed to have explained why, for example, by noting that perhaps previous exams in that class were all scored quite low.
b.
Suppose another exam given by a different instructor teaching the same course has a much lower average (e.g., 85%) but similar variability. Could the same raw score of 90% indicate a very different level of performance? Why or why not?
z <- (90 - 85) / 3
pnorm(z) * 100
[1] 95.22096
Yes, 90% would be far above average (top 5%) on an exam with a mean of 85%.
c.
At the University of Alberta, grades are formally defined according to the following descriptors:
- A± = Excellent
- B± = Good
- C± = Satisfactory
- D+ = Poor
- D = Minimal Pass
- F = Failure
Instructors are responsible for converting numerical performance into letter grades. Based on the Universityâs official grade descriptors, what letter grade do you think would be fair to assign to Frank on the first exam (from question âaâ)? There is no single correct answer, but you should justify your reasoning.
A Possible Answer:
There are twelve letter grades in total, placing B- and C+ in the centre. Given their position and accompanying descriptors, it's reasonable to interpret these as representing average performance. Consequently, Frank's grade must fall below a C+, since his performance is clearly well below average.
If we further assume that failing a course should be relatively uncommon at the university level, it would make sense to set the cutoff for an F around the 5th percentile; after all, a probability of 0.05 is generally considered quite rare. Because Frank's score places him within the bottom 5% of the class, assigning an F would be justified.
This reasoning naturally assumes that the letter-grade descriptors are intended to reflect performance relative to the cohort (as the University of Alberta specifies), rather than simply indicating whether a student has learned the necessary material.
Marks were lost if your answer didn't give a single precise letter grade or involved statements that were:
- demonstrably false (e.g., Frank scored just below the average)
- unknowable (e.g., Frank passed the exam)
- unjustified (e.g., saying 90% is still a decent score with no reasoning provided)
Question 9
A recent article in the prestigious journal of Techno-Terrors and Teen Minds has raised concern that excessive smartphone use is transforming young adults into "digital zombies." According to one particularly chilling report, members of Gen Z (ages 13 to 28) spend an average of 6 hours per day staring at their glowing screens, roughly the same amount of time classic movie zombies spend wandering aimlessly.
You decide to test whether this eerie claim holds true. You obtain a random sample of daily screen-time data from a group of Gen Z-aged users (see zombie_screentime.csv).
Conduct an analysis to evaluate this: do you believe the "digital zombie" claim is exaggerated or plausible?
You are allowed to use any R function(s) you would like for this question (though you are not permitted to install other packages beyond what is stated in the instructions).
# Load data
zombie <- read_csv("data/zombie_screentime.csv")
mu <- 6
# Check normality
ggplot(zombie, aes(sample = time_hr)) +
stat_qq(size = 2) +
stat_qq_line() +
labs(
x = "Theoretical Quantiles",
y = "Sample Quantiles"
)
The data look sufficiently normal, so the classical t-test is appropriate.
# T-test
t.test(zombie$time_hr, mu = mu)
One Sample t-test
data: zombie$time_hr
t = 3.2296, df = 388, p-value = 0.001345
alternative hypothesis: true mean is not equal to 6
95 percent confidence interval:
6.035639 6.146553
sample estimates:
mean of x
6.091096
# Effect size
d <- (mean(zombie$time_hr) - mu) / sd(zombie$time_hr)
d
[1] 0.1637471
The estimated mean population screentime is approximately 6.09 hours, while the report claims it to be 6 hours. This represents a difference of about 5 minutes. Although this difference is statistically significant, it is unlikely to be practically meaningful for most people (i.e., the effect is small, as confirmed by Cohen's d). Therefore, the report's claim is not unreasonable.
If you interpreted the term "exaggerated" in the question to mean "greater than" and consequently used a one-tailed hypothesis, that was taken into account in the scoring.
Question 10
a.
A neuroscientist records brain temperature (in Celsius) before and after a task. What scale of measurement does this involve?
- Interval
b.
Participants are asked to rank a series of images based on how emotionally disturbing they find them, from least to most disturbing. What scale of measurement is this?
- Ordinal
c.
Suppose you record the number of times a neuron fires per second in response to a stimulus. What scale of measurement does this represent?
- Ratio
d.
What scale of measurement would differences between rates of neuron firing be?
- Ratio
e.
You measure the amount of dopamine (in nanograms per milliliter) present in a ratâs nucleus accumbens after exposure to a drug. What scale of measurement is this?
- Ratio
f.
You group EEG data based on whether the participant was in a resting, task, or sleep condition. What scale of measurement is this?
- Nominal
g.
A researcher counts how many milliseconds it takes participants to identify an image of a fearful face. What scale of measurement is used?
- Ratio
h.
Subjects rate their pain during a mild electric shock using a 5-point scale from âno painâ to âextreme painâ. What kind of measurement scale is this?
- Ordinal
i.
In a memory experiment, participants are categorized based on the brain region that showed the most activation (e.g., hippocampus, amygdala, prefrontal cortex). What scale of measurement does this reflect?
- Nominal